Question 1 (Show main steps of your work to get full points)

The matrix \((X'X)^{-1}\) for the model \(y=β_0+β_1 x_1+β_2 x_2+β_3 x_3+β_4 x_4+β_5 x_5+β_6 x_6+ε\) is given below.

  1. If MSE = 1.395 and n = 38, compute the following (keep 4 or more decimal places; do NOT round in the intermediate steps):

  2. \(se(\hat\beta_4)\)

\[se(\mathbf{\hat\beta_4})=\sqrt{MSE\times C_{55}}=\sqrt{1.395\times0.069}=0.3102499\]

  1. \(Cov(\hat\beta_2,\hat\beta_4)\)

\[Cov(\mathbf{\hat\beta_2,\hat\beta_4})=MSE\times C_{35}=1.395\times(-0.035)=-0.048825\]

  1. \(Cor(\hat\beta_2,\hat\beta_4)\)

\[se(\mathbf{\hat\beta_2})=\sqrt{MSE\times C_{33}}=\sqrt{1.395\times0.067}=0.3057205\]

\[Cor(\mathbf{\hat\beta_2,\hat\beta_4})=\frac{Cov(\mathbf{\hat\beta_2,\hat\beta_4})}{se(\mathbf{\hat\beta_2})se(\mathbf{\hat\beta_4})}=\frac{-0.048825}{0.3057205\times0.3102499}=-0.5147615\]
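As a quick sanity check, the arithmetic above can be reproduced in R, plugging in the MSE and the relevant elements of \((X'X)^{-1}\) stated in the problem:

```r
# Check of the hand computations, using the values stated in the problem:
# MSE = 1.395, C55 = 0.069, C33 = 0.067, C35 = -0.035.
MSE <- 1.395
C55 <- 0.069; C33 <- 0.067; C35 <- -0.035
se_b4    <- sqrt(MSE * C55)             # standard error of beta4-hat
se_b2    <- sqrt(MSE * C33)             # standard error of beta2-hat
cov_b2b4 <- MSE * C35                   # covariance of beta2-hat and beta4-hat
cor_b2b4 <- cov_b2b4 / (se_b2 * se_b4)  # correlation of beta2-hat and beta4-hat
c(se_b4 = se_b4, cov_b2b4 = cov_b2b4, cor_b2b4 = cor_b2b4)
```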

  1. Without computing anything, explain which estimator is the most consistent.

\(C_{66}=0.058\) is the smallest diagonal element. Hence \(\hat\beta_5\) has the least variance and is the most consistent among the estimators.

  1. Without computing anything , list the pair(s) of estimators that are positively correlated. Provide a reason.

According to \((X'X)^{-1}\), the off-diagonal elements \(C_{13},\ C_{17},\ C_{24},\ C_{25},\ C_{67}\) are positive.

The positively correlated pairs of estimators are

\(\hat\beta_0\) and \(\hat\beta_2\), \(\hat\beta_0\) and \(\hat\beta_6\), \(\hat\beta_1\) and \(\hat\beta_3\), \(\hat\beta_1\) and \(\hat\beta_4\), \(\hat\beta_5\) and \(\hat\beta_6\).

  1. Consider the following hypothesis: \(H_0: β_1=2β_3,β_2=β_3,β_5=0\)

  2. Report the T matrix, β vector and c vector along with their dimensions, and the rank of T matrix for testing the above hypothesis.

\[ \mathbf{T}=\begin{bmatrix} 0 & 1 & 0 & -2 & 0 & 0& 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}_{3\times7} \mathbf{β}=\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \beta_6 \end{bmatrix}_{7\times1} \mathbf{C}=\begin{bmatrix} 0 \\ 0 \\ 0\end{bmatrix}_{3\times1} rank(T)=3 \]

  1. Report the values of the numerator and denominator degrees of freedom for the corresponding F test. For this F test, the numerator mean square comes from the difference in error sums of squares between the reduced and full models, while the denominator is the MSE of the full model; thus:

Under this hypothesis, \(y=β_0+2β_3x_1+β_3x_2+β_3x_3+β_4x_4+0\cdot x_5+β_6x_6+ε=β_0+β_3(2x_1+x_2+x_3)+β_4x_4+β_6x_6+ε\)

The numerator degrees of freedom is \(r=df_{Reduced}-df_{Full}=[n-(3+1)]-[n-(6+1)]=3\)

The denominator degrees of freedom is \(df_{Full}=n-(k+1)=38-(6+1)=31\)
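The same degrees of freedom fall out of a full-versus-reduced comparison with `anova()`. A self-contained sketch using simulated data with n = 38 and six hypothetical predictors (the data here are illustrative only):

```r
# Simulated data: n = 38, predictors x1..x6 (hypothetical stand-ins).
set.seed(1)
n <- 38
d <- as.data.frame(matrix(rnorm(n * 6), n, 6,
                          dimnames = list(NULL, paste0("x", 1:6))))
d$y <- with(d, 1 + 2 * x1 + x2 + x3 + 0.5 * x4 + 0.3 * x6 + rnorm(n))

# Full model vs. the reduced model implied by H0.
full    <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = d)
reduced <- lm(y ~ I(2 * x1 + x2 + x3) + x4 + x6, data = d)
anova(reduced, full)  # Df = 3 (numerator), Res.Df of full = 31 (denominator)
```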

  1. Show the following equation is an alternative form of the sum of squares of regression or model (SSR).

\[SSR=\sum_{i=1}^n(\hat y_i-\bar y)^2=\sum_{i=1}^n(\hat y_i^2-2\hat y_i\bar y+\bar y^2)=\sum_{i=1}^n\hat y_i^2-2\bar y\sum_{i=1}^n\hat y_i+\sum_{i=1}^n\bar y^2\]

\[=\sum_{i=1}^n\hat y_i^2-2\bar y\,n\frac{\sum_{i=1}^n\hat y_i}n+n\bar y^2=\sum_{i=1}^n\hat y_i^2-2\bar y\,n\bar y+n\bar y^2=\sum_{i=1}^n\hat y_i^2-n\bar y^2\]

where \(\frac1n\sum_{i=1}^n\hat y_i=\bar y\), since the least-squares residuals sum to zero when the model includes an intercept.
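The identity can also be verified numerically on any OLS fit with an intercept; a minimal check on simulated data:

```r
# Numeric check of the identity SSR = sum(yhat_i^2) - n*ybar^2
# on a small simulated fit.
set.seed(1)
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
fit  <- lm(y ~ x)
yhat <- fitted(fit)
ybar <- mean(y)
ssr_def <- sum((yhat - ybar)^2)              # definition of SSR
ssr_alt <- sum(yhat^2) - length(y) * ybar^2  # alternative form
all.equal(ssr_def, ssr_alt)                  # TRUE
```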

Question 2 (Use software to analyze the given data)

The data in the WaterFlow file are simulated data on peak rate of flow (in cfs) of water from six watersheds following storm episodes. The predictors are:

x1 : Area of watershed (mi²)
x2 : Area impervious to water (mi²)
x3 : Average slope of watershed (percent)
x4 : Longest stream flow in watershed (1000s of feet)
x5 : Surface absorbency index (0 = complete absorbency, 100 = no absorbency)
x6 : Estimated soil storage capacity (inches of water)
x7 : Infiltration rate of water into soil (inches/hour)
x8 : Rainfall (inches)
x9 : Time period during which rainfall exceeded ¼ inch/hour

  1. Create the matrix of scatterplots and compute the correlation matrix for all the variables. Copy and paste them here.


  1. Based on scatterplots and correlation, explain which predictors are significantly related to (most likely to contribute to the variation in) the response variable.

X1, X2, X4, and X7 have medium-to-strong positive linear relationships with the response variable (correlations above 0.6), while X5 has a medium negative linear relationship with the response.



  1. Fit the full model.

  2. Explain whether the overall model is significant at 5% significance level.

The fitted model is statistically significant at the 5% significance level (F = 10.22 on 9 and 20 df, p-value = 9.744e-06 < 0.05).


  1. Explain whether assumptions of random errors and model are satisfied. If there is a violation of those, then suggest reasonable methods to correct them.
  • Residual Diagnostics:

Includes plots to examine residuals to validate OLS assumptions

There is no violation of assumptions about the errors (no pattern on residual plots and points follow approximately straight line on the qq plot).

  • Variable selection:

Different variable selection procedures, such as all-possible regression, best-subset regression, stepwise regression, stepwise forward regression and stepwise backward regression

  • Heteroskedasticity:

Tests for heteroskedasticity include the Bartlett test, Breusch-Pagan test, score test and F test

  • Measures of influence:

Use different plots to detect and identify influential observations
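Two of the checks listed above can also be run in base R (the olsrr calls in the code section, e.g. `ols_plot_diagnostics()` and `ols_plot_cooksd_bar()`, wrap the same quantities). A self-contained toy sketch; the same calls apply to the fitted full-model object:

```r
# Toy fit standing in for the full WaterFlow model (hypothetical data).
set.seed(2)
x <- rnorm(30)
y <- 1 + x + rnorm(30)
fit <- lm(y ~ x)

# Influence: Cook's distance for each observation.
cd <- cooks.distance(fit)
which(cd > 4 / length(y))  # common rule-of-thumb flag

# Residual diagnostics: normal QQ plot of the residuals.
qqnorm(residuals(fit)); qqline(residuals(fit))
```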


  1. How much of the sum of squares is explained by rainfall, given that all the other regression coefficients are in the model?

Given that all the other regression coefficients are in the model, rainfall (X8) explains a partial (Type II) sum of squares of 2,209,825 (see the `Anova(model_wf_full)` table in the output), which is significant at the 5% level (p = 0.0241).
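A partial (Type II) sum of squares is the drop in SSE when that one predictor is removed from the full model. A self-contained toy illustration (the names x1, x2 here are hypothetical; the same logic applies to X8 in the WaterFlow model):

```r
# Toy data: two predictors, one response.
set.seed(3)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + d$x1 + 2 * d$x2 + rnorm(30)
fit <- lm(y ~ x1 + x2, data = d)

# SS(x2 | x1) = SSE(model without x2) - SSE(full model)
ss_x2 <- deviance(update(fit, . ~ . - x2)) - deviance(fit)
drop1(fit)  # the "Sum of Sq" column reports the same partial SS
```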


  1. Explain whether there is a problem of multicollinearity.
  • Collinearity diagnostics:

VIF, tolerance and condition indices to detect collinearity, and plots for assessing model fit and the contributions of variables

The model does have a serious problem of multicollinearity: predictors X4, X1, X3, X7, and X5 have VIF values greater than 10.

It will be important to address the multicollinearity first. According to the variable descriptions, predictor x6 (estimated soil storage capacity) may contain much the same information as x7 (infiltration rate of water into soil). Further, according to the correlation coefficients, x7 is less correlated with y than x6 is.

It is therefore better to remove x7 first and check whether the multicollinearity is resolved. If it is not, then x6 has to be removed as well.
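For reference, the VIF used here is \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor j on the remaining predictors; this is the computation behind olsrr's `ols_vif_tol()`. A self-contained toy illustration with deliberately collinear data (the names and data are hypothetical):

```r
# Toy collinear design: x2 nearly duplicates x1.
set.seed(5)
X <- data.frame(x1 = rnorm(40))
X$x2 <- X$x1 + rnorm(40, sd = 0.1)
X$x3 <- rnorm(40)

# VIF by hand: regress each predictor on the others.
vif <- sapply(names(X), function(j) {
  r2 <- summary(lm(reformulate(setdiff(names(X), j), j), data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 1)  # x1 and x2 have VIF well above 10; x3 stays near 1
```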


  1. Interpret the estimated coefficient of rainfall predictor of the full model using question context.

The estimated coefficient of rainfall (X8) in the full model is 511.713. Holding all the other predictors constant, each additional inch of rainfall is associated with an estimated increase of about 511.7 cfs in the peak rate of water flow.


  1. Create a new variable using natural log of response. Then fit the full model using this new variable as response.
  1. Explain whether the overall model is significant at 5% significance level.

The fitted model is statistically significant at the 5% significance level (p-value = )


  1. Explain whether there is a problem of multicollinearity.

The log transformation changes only the response, not the predictors, so the relationships among the predictors are unchanged and there is still a problem of multicollinearity.


  1. If you wanted to simplify this full model, explain which predictor you would eliminate first.

As discussed for the original model, predictor x7 would be eliminated first, since it appears to overlap with x6 and is less correlated with the response.


The general approaches for dealing with multicollinearity include collecting additional data, model respecification (redefining the regressors, variable elimination), and alternative estimation methods (ridge regression, principal-component regression).

“Variable elimination is often a highly effective technique. However, it may not provide a satisfactory solution if the regressors dropped from the model have significant explanatory power relative to the response y. That is, eliminating regressors to reduce multicollinearity may damage the predictive power of the model.” (p.304)


  1. Use the forward selection method to find the best model (use α=0.15) and report the final fitted model with estimated coefficients here.

Stepwise Forward Regression based on p values (use α=0.15)

Stepwise AIC Forward Regression

Full model

eliminated model


  1. Use the backward elimination method to find the best model (use α=0.05) and report the final fitted model with estimated coefficients here.

Stepwise Backward Regression based on p values (use α=0.05)

Stepwise AIC Backward Regression

Full model

eliminated model
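As a reference for the backward step, base R's `step()` gives an AIC-based analogue; olsrr's `ols_step_backward_p()` applies the p-value rule (prem = 0.05) used above instead. A self-contained toy sketch (the data here stand in for the WaterFlow log model):

```r
# Toy data: only x1 actually drives the response.
set.seed(9)
d <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
d$y <- 1 + 2 * d$x1 + rnorm(40)
fit  <- lm(y ~ x1 + x2 + x3, data = d)

# Backward elimination by AIC, starting from the full model.
back <- step(fit, direction = "backward", trace = 0)
formula(back)  # the strong predictor x1 is retained
```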


  1. Use best subsets method (6 models from each size) to find the best model for these data and report the final fitted model with estimated coefficients here.

Full model

eliminated model


  1. If the final models in the previous 3 methods are different, compare their model adequacy and suggest one best model.

Neither model has a problem of multicollinearity (VIF < 10) or a violation of the assumptions about the errors (no pattern in the residual plots, and the points follow an approximately straight line on the QQ plot).

The model with 4 predictors has a slightly higher (by about 2%) adjusted R-square compared to the model with only x1 and x2. Further, the x5 and x7 predictors are not statistically significant at the 10% significance level (p-values of 0.11479 and 0.10356, respectively). There is no significant pattern in the plot of studentized residuals versus predicted values from the model with only x1 and x2, and the partial regression plots do not show nonlinear patterns, so first-order terms are adequate.

Finally, the model with 2 predictors is simpler than the model with 4 predictors. Therefore, the best model is the one with only x1 and x2.



  1. Provide complete ANOVA table for the best model. Provide partial sum of squares, estimated coefficients, standard errors, p-values, 95% Bonferroni joint confidence intervals for the coefficients of the best model. Provide in a tabular form clearly.
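For the Bonferroni joint intervals requested here: with g coefficients considered jointly, each individual interval is computed at level \(1-0.05/g\). A self-contained toy sketch (the same `confint()` call applies to the best model's `lm` object):

```r
# Toy fit standing in for the best model (hypothetical data).
set.seed(7)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(30)
fit <- lm(y ~ x1 + x2, data = d)

g  <- length(coef(fit))              # number of joint intervals
ci <- confint(fit, level = 1 - 0.05 / g)  # Bonferroni-adjusted intervals
ci
```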

  1. How much variation in the response is explained by the best model after taking number of data and regression coefficients in to account?

  1. Report the PRESS statistic of the best model.
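The PRESS statistic can be computed via the leverage shortcut \(PRESS=\sum_i\big(e_i/(1-h_{ii})\big)^2\), which equals the sum of squared leave-one-out prediction errors; olsrr's `ols_press()` reports the same value for a fitted model. A self-contained toy sketch:

```r
# Toy fit standing in for the best model (hypothetical data).
set.seed(42)
d <- data.frame(x = rnorm(25))
d$y <- 3 + 2 * d$x + rnorm(25)
fit <- lm(y ~ x, data = d)

# PRESS via ordinary residuals and leverages.
press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)
press
```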



  1. Report the complete code along with output here.

(a) The matrix of scatterplots and the correlation matrix

(c) The fitted full model

# build the model
model_wf_full <- lm(y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data=table_wf)
ols_regress(model_wf_full)
##                           Model Summary                            
## ------------------------------------------------------------------
## R                       0.906       RMSE                  609.308 
## R-Squared               0.821       Coef. Var              47.188 
## Adj. R-Squared          0.741       MSE                371256.369 
## Pred R-Squared          0.618       MAE                   366.548 
## ------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                  ANOVA                                   
## ------------------------------------------------------------------------
##                     Sum of                                              
##                    Squares        DF    Mean Square      F         Sig. 
## ------------------------------------------------------------------------
## Regression    34143007.990         9    3793667.554    10.218    0.0000 
## Residual       7425127.376        20     371256.369                     
## Total         41568135.367        29                                    
## ------------------------------------------------------------------------
## 
##                                        Parameter Estimates                                        
## -------------------------------------------------------------------------------------------------
##       model        Beta    Std. Error    Std. Beta      t        Sig          lower        upper 
## -------------------------------------------------------------------------------------------------
## (Intercept)     292.561      4428.618                  0.066    0.948     -8945.373     9530.495 
##          X1    -203.144       410.268       -0.472    -0.495    0.626     -1058.947      652.660 
##          X2    1055.782      9833.700        0.028     0.107    0.916    -19456.957    21568.521 
##          X3     -49.240       156.200       -0.167    -0.315    0.756      -375.067      276.588 
##          X4     209.762       162.046        1.258     1.294    0.210      -128.259      547.783 
##          X5     -10.197        51.088       -0.059    -0.200    0.844      -116.764       96.370 
##          X6     -24.558       303.529       -0.012    -0.081    0.936      -657.709      608.592 
##          X7     142.778      3288.443        0.019     0.043    0.966     -6716.793     7002.349 
##          X8     511.713       209.741        0.541     2.440    0.024        74.200      949.226 
##          X9    -301.872       171.996       -0.398    -1.755    0.095      -660.649       56.905 
## -------------------------------------------------------------------------------------------------
model_wf_full %>% summary()
## 
## Call:
## lm(formula = y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, 
##     data = table_wf)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1404.21  -318.77    74.73   266.66  1274.30 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   292.56    4428.62   0.066   0.9480  
## X1           -203.14     410.27  -0.495   0.6259  
## X2           1055.78    9833.70   0.107   0.9156  
## X3            -49.24     156.20  -0.315   0.7558  
## X4            209.76     162.05   1.294   0.2103  
## X5            -10.20      51.09  -0.200   0.8438  
## X6            -24.56     303.53  -0.081   0.9363  
## X7            142.78    3288.44   0.043   0.9658  
## X8            511.71     209.74   2.440   0.0241 *
## X9           -301.87     172.00  -1.755   0.0945 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 609.3 on 20 degrees of freedom
## Multiple R-squared:  0.8214, Adjusted R-squared:  0.741 
## F-statistic: 10.22 on 9 and 20 DF,  p-value: 9.744e-06
Anova(model_wf_full)
## Anova Table (Type II tests)
## 
## Response: y
##            Sum Sq Df F value  Pr(>F)  
## X1          91022  1  0.2452 0.62589  
## X2           4279  1  0.0115 0.91557  
## X3          36893  1  0.0994 0.75585  
## X4         622091  1  1.6756 0.21025  
## X5          14790  1  0.0398 0.84381  
## X6           2430  1  0.0065 0.93632  
## X7            700  1  0.0019 0.96580  
## X8        2209825  1  5.9523 0.02414 *
## X9        1143622  1  3.0804 0.09455 .
## Residuals 7425127 20                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(c) ii Residual diagnostics

#Model Fit Assessment
ols_plot_diagnostics(model_wf_full)

# Part & Partial Correlations
ols_test_correlation(model_wf_full) # Correlation between observed residuals and expected residuals under normality.
## [1] 0.9710713
# Residual Normality Test
ols_test_normality(model_wf_full) # Test for detecting violation of normality assumption. #If p-value is bigger, then no problem of non-normality #
## -----------------------------------------------
##        Test             Statistic       pvalue  
## -----------------------------------------------
## Shapiro-Wilk              0.9589         0.2898 
## Kolmogorov-Smirnov        0.1423         0.5314 
## Cramer-von Mises          2.5333         0.0000 
## Anderson-Darling          0.5169         0.1748 
## -----------------------------------------------

(c) iii The partial regression and nonlinear diagnostics

#Lack of Fit F Test
ols_pure_error_anova(lm(y~X8, data = table_wf))
## Lack of Fit F Test 
## ---------------
## Response :   y 
## Predictor:   X8 
## 
##                        Analysis of Variance Table                         
## -------------------------------------------------------------------------
##                 DF      Sum Sq        Mean Sq      F Value       Pr(>F)   
## -------------------------------------------------------------------------
## X8               1     4616882.92    4616882.92    5.795558    0.02290414 
## Residual        28    36951252.44    1319687.59                           
##  Lack of fit    21    31374881.28    1494041.97    1.875466     0.2003839 
##  Pure Error      7     5576371.17     796624.45                           
## -------------------------------------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_full)

# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_full)

(c) iv Collinearity diagnostics

(d) The fitted log model

(d) (2) Collinearity diagnostics

(d) (3) Variable selection

alias(lm(y ~ as.factor(X3) + as.factor(X4) + as.factor(X5) + as.factor(X6) + as.factor(X7), data=table_wf))
## Model :
## y ~ as.factor(X3) + as.factor(X4) + as.factor(X5) + as.factor(X6) + 
##     as.factor(X7)
## 
## Complete :
##                   (Intercept) as.factor(X3)6 as.factor(X3)6.5 as.factor(X3)7 as.factor(X3)15 as.factor(X4)2 as.factor(X5)60 as.factor(X5)65 as.factor(X5)70 as.factor(X6)1
## as.factor(X4)10    0           0              0                0              1               0              0               0               0               0            
## as.factor(X4)15    0           1              0                1              0               0              0               0               0               0            
## as.factor(X4)19    0           0              1                0              0              -1              0               0               0               0            
## as.factor(X5)62    0           1              0                0              0               0              0               0               0               0            
## as.factor(X5)67    0           0              0                1              0               0              0               0               0               0            
## as.factor(X5)68    0           0              0                0              1               1             -1              -1               0               0            
## as.factor(X5)80    1          -1             -1               -1             -1               0              0               0              -1               0            
## as.factor(X6)1.5   0           1              0                0              0               0              0               0               1               0            
## as.factor(X6)2     1          -1              0               -1             -1              -1              1               1              -1              -1            
## as.factor(X7)0.2   0           0              0                0              1               0              0               0               0               0            
## as.factor(X7)0.25  1          -1             -1               -1             -1               0              0               0               0               0            
## as.factor(X7)0.35  0           0              0                0             -1               0              1               1               0               0            
## as.factor(X7)0.5   0           0              1                1              0              -1              0               0               0               0            
## as.factor(X7)0.6   0           1              0                0              0               0              0               0               0               0

(d) (4) Forward selection

Stepwise Forward Regression for full model

# Stepwise Forward Regression based on p values (use α=0.15) #
ols_step_forward_p(model_wf_full_log, penter = 0.15)
## Forward Selection Method    
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. X1 
## 2. X2 
## 3. X3 
## 4. X4 
## 5. X5 
## 6. X6 
## 7. X7 
## 8. X8 
## 9. X9 
## 
## We are selecting variables based on p value...
## 
## Variables Entered: 
## 
## - X4 
## - X3 
## - X7 
## 
## No more variables to be added.
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.944       RMSE               0.549 
## R-Squared               0.890       Coef. Var          8.618 
## Adj. R-Squared          0.878       MSE                0.301 
## Pred R-Squared          0.854       MAE                0.414 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     63.565         3         21.188    70.378    0.0000 
## Residual        7.828        26          0.301                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                  Parameter Estimates                                  
## -------------------------------------------------------------------------------------
##       model     Beta    Std. Error    Std. Beta      t       Sig      lower    upper 
## -------------------------------------------------------------------------------------
## (Intercept)    2.872         0.547                 5.254    0.000     1.748    3.995 
##          X4    0.122         0.033        0.559    3.730    0.001     0.055    0.189 
##          X3    0.168         0.040        0.435    4.165    0.000     0.085    0.251 
##          X7    3.106         1.537        0.309    2.021    0.054    -0.053    6.266 
## -------------------------------------------------------------------------------------
## 
##                            Selection Summary                             
## ------------------------------------------------------------------------
##         Variable                  Adj.                                      
## Step    Entered     R-Square    R-Square     C(p)        AIC       RMSE     
## ------------------------------------------------------------------------
##    1    X4            0.8030      0.7960    48.8552    68.4060    0.7087    
##    2    X3            0.8731      0.8637    24.2129    57.2082    0.5792    
##    3    X7            0.8904      0.8777    19.6668    54.8305    0.5487    
## ------------------------------------------------------------------------
# Stepwise AIC Forward Regression #
ols_step_forward_aic(model_wf_full_log)
## Forward Selection Method 
## ------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X4 
## 5 . X5 
## 6 . X6 
## 7 . X7 
## 8 . X8 
## 9 . X9 
## 
## 
## Variables Entered: 
## 
## - X4 
## - X3 
## - X7 
## - X8 
## - X9 
## - X6 
## 
## No more variables to be added.
## 
##                        Selection Summary                        
## ---------------------------------------------------------------
## Variable      AIC      Sum Sq     RSS       R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------
## X4           68.406    57.330    14.063    0.80302      0.79599 
## X3           57.208    62.335     9.057    0.87313      0.86373 
## X7           54.830    63.565     7.828    0.89036      0.87771 
## X8           54.522    64.144     7.248    0.89848      0.88223 
## X9           44.504    66.537     4.856    0.93199      0.91782 
## X6           39.161    67.591     3.801    0.94675      0.93286 
## ---------------------------------------------------------------

Stepwise Forward Regression for X4 eliminated model

# Stepwise Forward Regression based on p values (use α=0.15) #
ols_step_forward_p(model_wf_rm4_log, penter = 0.15)
## Forward Selection Method    
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. X1 
## 2. X2 
## 3. X3 
## 4. X5 
## 5. X6 
## 6. X7 
## 7. X8 
## 8. X9 
## 
## We are selecting variables based on p value...
## 
## Variables Entered: 
## 
## - X1 
## - X3 
## - X7 
## - X6 
## - X8 
## - X9 
## 
## No more variables to be added.
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.971       RMSE               0.421 
## R-Squared               0.943       Coef. Var          6.618 
## Adj. R-Squared          0.928       MSE                0.178 
## Pred R-Squared          0.900       MAE                0.292 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.310         6         11.218    63.195    0.0000 
## Residual        4.083        23          0.178                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.307         0.410                  5.623    0.000     1.458     3.156 
##          X1     0.207         0.053        0.368     3.897    0.001     0.097     0.317 
##          X3     0.263         0.022        0.680    11.944    0.000     0.217     0.308 
##          X7     5.453         1.002        0.542     5.442    0.000     3.380     7.525 
##          X6    -0.532         0.144       -0.192    -3.688    0.001    -0.831    -0.234 
##          X8     0.613         0.137        0.495     4.462    0.000     0.329     0.897 
##          X9    -0.433         0.112       -0.435    -3.864    0.001    -0.665    -0.201 
## ----------------------------------------------------------------------------------------
## 
##                             Selection Summary                             
## -------------------------------------------------------------------------
##         Variable                  Adj.                                       
## Step    Entered     R-Square    R-Square      C(p)        AIC       RMSE     
## -------------------------------------------------------------------------
##    1    X1            0.5266      0.5097    154.8516    94.7131    1.0987    
##    2    X3            0.8121      0.7981     47.7988    68.9988    0.7050    
##    3    X7            0.8718      0.8570     26.9889    59.5306    0.5934    
##    4    X6            0.8932      0.8761     20.8073    56.0486    0.5523    
##    5    X8            0.9057      0.8860     18.0270    54.3108    0.5297    
##    6    X9            0.9428      0.9279      5.8470    41.3046    0.4213    
## -------------------------------------------------------------------------
# Stepwise AIC Forward Regression #
ols_step_forward_aic(model_wf_rm4_log)
## Forward Selection Method 
## ------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## 
## Variables Entered: 
## 
## - X1 
## - X3 
## - X7 
## - X6 
## - X8 
## - X9 
## 
## No more variables to be added.
## 
##                        Selection Summary                        
## ---------------------------------------------------------------
## Variable      AIC      Sum Sq     RSS       R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------
## X1           94.713    37.594    33.799    0.52658      0.50967 
## X3           68.999    57.974    13.418    0.81205      0.79813 
## X7           59.531    62.237     9.155    0.87176      0.85696 
## X6           56.049    63.766     7.626    0.89318      0.87609 
## X8           54.311    64.660     6.733    0.90569      0.88604 
## X9           41.305    67.310     4.083    0.94281      0.92789 
## ---------------------------------------------------------------

Stepwise Forward Regression for X1 eliminated model

# Stepwise Forward Regression based on p values (use α=0.15) #
ols_step_forward_p(model_wf_rm1_log, penter = 0.15)
## Forward Selection Method    
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. X2 
## 2. X3 
## 3. X4 
## 4. X5 
## 5. X6 
## 6. X7 
## 7. X8 
## 8. X9 
## 
## We are selecting variables based on p value...
## 
## Variables Entered: 
## 
## - X4 
## - X3 
## - X7 
## 
## No more variables to be added.
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.944       RMSE               0.549 
## R-Squared               0.890       Coef. Var          8.618 
## Adj. R-Squared          0.878       MSE                0.301 
## Pred R-Squared          0.854       MAE                0.414 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     63.565         3         21.188    70.378    0.0000 
## Residual        7.828        26          0.301                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                  Parameter Estimates                                  
## -------------------------------------------------------------------------------------
##       model     Beta    Std. Error    Std. Beta      t       Sig      lower    upper 
## -------------------------------------------------------------------------------------
## (Intercept)    2.872         0.547                 5.254    0.000     1.748    3.995 
##          X4    0.122         0.033        0.559    3.730    0.001     0.055    0.189 
##          X3    0.168         0.040        0.435    4.165    0.000     0.085    0.251 
##          X7    3.106         1.537        0.309    2.021    0.054    -0.053    6.266 
## -------------------------------------------------------------------------------------
## 
##                            Selection Summary                             
## ------------------------------------------------------------------------
##         Variable                  Adj.                                      
## Step    Entered     R-Square    R-Square     C(p)        AIC       RMSE     
## ------------------------------------------------------------------------
##    1    X4            0.8030      0.7960    52.5895    68.4060    0.7087    
##    2    X3            0.8731      0.8637    26.6181    57.2082    0.5792    
##    3    X7            0.8904      0.8777    21.7454    54.8305    0.5487    
## ------------------------------------------------------------------------
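
The ANOVA F statistic in the final-model output is simply MSR/MSE with (k, n − k − 1) degrees of freedom. A quick check with the sums of squares from the table above (small differences come from the rounded SS values):

```python
# Recompute the ANOVA F test for the X4 + X3 + X7 model.
n, k = 30, 3                       # Total DF = 29, three predictors entered
ss_reg, ss_res = 63.565, 7.828     # Regression / Residual SS from the table
msr = ss_reg / k                   # mean square regression, ~21.188
mse = ss_res / (n - k - 1)         # mean square error, ~0.301
f = msr / mse                      # close to the reported 70.378
print(msr, mse, f)
```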
# Stepwise AIC Forward Regression #
ols_step_forward_aic(model_wf_rm1_log)
## Forward Selection Method 
## ------------------------
## 
## Candidate Terms: 
## 
## 1 . X2 
## 2 . X3 
## 3 . X4 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## 
## Variables Entered: 
## 
## - X4 
## - X3 
## - X7 
## - X8 
## - X9 
## - X6 
## 
## No more variables to be added.
## 
##                        Selection Summary                        
## ---------------------------------------------------------------
## Variable      AIC      Sum Sq     RSS       R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------
## X4           68.406    57.330    14.063    0.80302      0.79599 
## X3           57.208    62.335     9.057    0.87313      0.86373 
## X7           54.830    63.565     7.828    0.89036      0.87771 
## X8           54.522    64.144     7.248    0.89848      0.88223 
## X9           44.504    66.537     4.856    0.93199      0.91782 
## X6           39.161    67.591     3.801    0.94675      0.93286 
## ---------------------------------------------------------------

(d) (5) Backward selection

Stepwise Backward Regression for the full model

# Stepwise Backward Regression based on p values; α=0.05 was intended, but ols_step_backward_p takes its removal threshold via `prem`, not `penter`, so the default prem = 0.3 was applied (see the message in the output below) #
ols_step_backward_p(model_wf_full_log, penter = 0.05)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X4 
## 5 . X5 
## 6 . X6 
## 7 . X7 
## 8 . X8 
## 9 . X9 
## 
## We are eliminating variables based on p value...
## 
## Variables Removed: 
## 
## - X1 
## - X2 
## - X5 
## 
## No more variables satisfy the condition of p value = 0.3
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.973       RMSE               0.407 
## R-Squared               0.947       Coef. Var          6.385 
## Adj. R-Squared          0.933       MSE                0.165 
## Pred R-Squared          0.908       MAE                0.273 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.591         6         11.265     68.16    0.0000 
## Residual        3.801        23          0.165                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.692         0.445                  6.046    0.000     1.771     3.613 
##          X3     0.184         0.032        0.476     5.698    0.000     0.117     0.251 
##          X4     0.109         0.026        0.499     4.244    0.000     0.056     0.162 
##          X6    -0.368         0.146       -0.133    -2.526    0.019    -0.669    -0.066 
##          X7     4.085         1.213        0.406     3.367    0.003     1.575     6.595 
##          X8     0.612         0.133        0.493     4.614    0.000     0.337     0.886 
##          X9    -0.448         0.108       -0.450    -4.135    0.000    -0.672    -0.224 
## ----------------------------------------------------------------------------------------
## 
## 
##                           Elimination Summary                           
## -----------------------------------------------------------------------
##         Variable                  Adj.                                     
## Step    Removed     R-Square    R-Square     C(p)       AIC       RMSE     
## -----------------------------------------------------------------------
##    1    X1            0.9474      0.9273    8.0021    42.8146    0.4230    
##    2    X2            0.9472      0.9304    6.0604    40.9019    0.4139    
##    3    X5            0.9468      0.9329    4.2345    39.1611    0.4065    
## -----------------------------------------------------------------------
# Stepwise AIC Backward Regression #
ols_step_backward_aic(model_wf_full_log)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X4 
## 5 . X5 
## 6 . X6 
## 7 . X7 
## 8 . X8 
## 9 . X9 
## 
## 
## Variables Removed: 
## 
## - X1 
## - X2 
## - X5 
## 
## No more variables to be removed.
## 
## 
##                   Backward Elimination Summary                   
## ---------------------------------------------------------------
## Variable       AIC       RSS     Sum Sq     R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------
## Full Model    44.811    3.757    67.635    0.94737      0.92369 
## X1            42.815    3.758    67.635    0.94737      0.92731 
## X2            40.902    3.769    67.624    0.94721      0.93042 
## X5            39.161    3.801    67.591    0.94675      0.93286 
## ---------------------------------------------------------------
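
The C(p) values in the elimination summary can be reproduced as Mallows' statistic RSS_p/σ̂²_full − (n − 2p), where σ̂²_full is the full-model MSE and p counts coefficients including the intercept. A sketch using the RSS values from the backward-AIC summary above (full model: 9 predictors + intercept, RSS = 3.757):

```python
# Mallows' C(p) = RSS_p / sigma2_full - (n - 2p).
n = 30
sigma2_full = 3.757 / (n - 10)   # full-model MSE: RSS / (n - p_full)

def mallows_cp(rss, p):          # p = coefficients incl. the intercept
    return rss / sigma2_full - (n - 2 * p)

print(mallows_cp(3.769, 8))  # after removing X2: ~6.06 (table: 6.0604)
print(mallows_cp(3.801, 7))  # after removing X5: ~4.23 (table: 4.2345)
```

The small discrepancies come from the rounded RSS values in the printed summary.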

Stepwise Backward Regression for the X4-eliminated model

# Stepwise Backward Regression based on p values; α=0.05 was intended, but ols_step_backward_p takes its removal threshold via `prem`, not `penter`, so the default prem = 0.3 was applied (see the message in the output below) #
ols_step_backward_p(model_wf_rm4_log, penter = 0.05)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## We are eliminating variables based on p value...
## 
## Variables Removed: 
## 
## - X5 
## - X2 
## 
## No more variables satisfy the condition of p value = 0.3
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.971       RMSE               0.421 
## R-Squared               0.943       Coef. Var          6.618 
## Adj. R-Squared          0.928       MSE                0.178 
## Pred R-Squared          0.900       MAE                0.292 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.310         6         11.218    63.195    0.0000 
## Residual        4.083        23          0.178                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.307         0.410                  5.623    0.000     1.458     3.156 
##          X1     0.207         0.053        0.368     3.897    0.001     0.097     0.317 
##          X3     0.263         0.022        0.680    11.944    0.000     0.217     0.308 
##          X6    -0.532         0.144       -0.192    -3.688    0.001    -0.831    -0.234 
##          X7     5.453         1.002        0.542     5.442    0.000     3.380     7.525 
##          X8     0.613         0.137        0.495     4.462    0.000     0.329     0.897 
##          X9    -0.433         0.112       -0.435    -3.864    0.001    -0.665    -0.201 
## ----------------------------------------------------------------------------------------
## 
## 
##                           Elimination Summary                           
## -----------------------------------------------------------------------
##         Variable                  Adj.                                     
## Step    Removed     R-Square    R-Square     C(p)       AIC       RMSE     
## -----------------------------------------------------------------------
##    1    X5            0.9444      0.9267    7.2445    42.4657    0.4248    
##    2    X2            0.9428      0.9279    5.8470    41.3046    0.4213    
## -----------------------------------------------------------------------
# Stepwise AIC Backward Regression #
ols_step_backward_aic(model_wf_rm4_log)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## 
## Variables Removed: 
## 
## - X5 
## - X2 
## 
## No more variables to be removed.
## 
## 
##                   Backward Elimination Summary                   
## ---------------------------------------------------------------
## Variable       AIC       RSS     Sum Sq     R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------
## Full Model    44.118    3.925    67.468    0.94503      0.92409 
## X5            42.466    3.970    67.422    0.94439      0.92669 
## X2            41.305    4.083    67.310    0.94281      0.92789 
## ---------------------------------------------------------------

Stepwise Backward Regression for the X1-eliminated model

# Stepwise Backward Regression based on p values; α=0.05 was intended, but ols_step_backward_p takes its removal threshold via `prem`, not `penter`, so the default prem = 0.3 was applied (see the message in the output below) #
ols_step_backward_p(model_wf_rm1_log, penter = 0.05)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . X2 
## 2 . X3 
## 3 . X4 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## We are eliminating variables based on p value...
## 
## Variables Removed: 
## 
## - X2 
## - X5 
## 
## No more variables satisfy the condition of p value = 0.3
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.973       RMSE               0.407 
## R-Squared               0.947       Coef. Var          6.385 
## Adj. R-Squared          0.933       MSE                0.165 
## Pred R-Squared          0.908       MAE                0.273 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.591         6         11.265     68.16    0.0000 
## Residual        3.801        23          0.165                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.692         0.445                  6.046    0.000     1.771     3.613 
##          X3     0.184         0.032        0.476     5.698    0.000     0.117     0.251 
##          X4     0.109         0.026        0.499     4.244    0.000     0.056     0.162 
##          X6    -0.368         0.146       -0.133    -2.526    0.019    -0.669    -0.066 
##          X7     4.085         1.213        0.406     3.367    0.003     1.575     6.595 
##          X8     0.612         0.133        0.493     4.614    0.000     0.337     0.886 
##          X9    -0.448         0.108       -0.450    -4.135    0.000    -0.672    -0.224 
## ----------------------------------------------------------------------------------------
## 
## 
##                           Elimination Summary                           
## -----------------------------------------------------------------------
##         Variable                  Adj.                                     
## Step    Removed     R-Square    R-Square     C(p)       AIC       RMSE     
## -----------------------------------------------------------------------
##    1    X2            0.9472      0.9304    7.0612    40.9019    0.4139    
##    2    X5            0.9468      0.9329    5.2440    39.1611    0.4065    
## -----------------------------------------------------------------------
# Stepwise AIC Backward Regression #
ols_step_backward_aic(model_wf_rm1_log)
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1 . X2 
## 2 . X3 
## 3 . X4 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## 
## Variables Removed: 
## 
## - X2 
## - X5 
## 
## No more variables to be removed.
## 
## 
##                   Backward Elimination Summary                   
## ---------------------------------------------------------------
## Variable       AIC       RSS     Sum Sq     R-Sq      Adj. R-Sq 
## ---------------------------------------------------------------
## Full Model    42.815    3.758    67.635    0.94737      0.92731 
## X2            40.902    3.769    67.624    0.94721      0.93042 
## X5            39.161    3.801    67.591    0.94675      0.93286 
## ---------------------------------------------------------------

(d) (6) Best Subset Regression

##          Best Subsets Regression         
## -----------------------------------------
## Model Index    Predictors
## -----------------------------------------
##      1         X4                         
##      2         X3 X4                      
##      3         X3 X4 X7                   
##      4         X1 X4 X8 X9                
##      5         X3 X4 X7 X8 X9             
##      6         X3 X4 X6 X7 X8 X9          
##      7         X3 X4 X5 X6 X7 X8 X9       
##      8         X2 X3 X4 X5 X6 X7 X8 X9    
##      9         X1 X2 X3 X4 X5 X6 X7 X8 X9 
## -----------------------------------------
## 
##                                                   Subsets Regression Summary                                                  
## ------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred                                                                                        
## Model    R-Square    R-Square    R-Square     C(p)        AIC        SBIC        SBC       MSEP      FPE       HSP       APC  
## ------------------------------------------------------------------------------------------------------------------------------
##   1        0.8030      0.7960      0.7717    48.8552    68.4060    -19.8453    72.6096    0.5382    0.5357    0.0186    0.2251 
##   2        0.8731      0.8637      0.8435    24.2129    57.2082    -30.4801    62.8130    0.3733    0.3690    0.0129    0.1551 
##   3        0.8904      0.8777      0.8539    19.6668    54.8305    -32.7027    61.8365    0.3484    0.3412    0.0120    0.1434 
##   4        0.9209      0.9082      0.8862    10.0581    47.0333    -38.1223    55.4405    0.2723    0.2635    0.0094    0.1107 
##   5        0.9320      0.9178      0.8917     7.8458    44.5038    -38.7554    54.3122    0.2545    0.2428    0.0088    0.1020 
##   6        0.9468      0.9329      0.9084     4.2345    39.1611    -39.6845    50.3706    0.2174    0.2038    0.0075    0.0857 
##   7        0.9472      0.9304      0.9021     6.0604    40.9019    -36.7978    53.5126    0.2360    0.2170    0.0082    0.0912 
##   8        0.9474      0.9273      0.8957     8.0021    42.8146    -33.8243    56.8265    0.2589    0.2326    0.0089    0.0977 
##   9        0.9474      0.9237       0.886    10.0000    44.8113    -30.8250    60.2245    0.2861    0.2505    0.0099    0.1053 
## ------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria 
##  SBIC: Sawa's Bayesian Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
##  MSEP: Estimated error of prediction, assuming multivariate normality 
##  FPE: Final Prediction Error 
##  HSP: Hocking's Sp 
##  APC: Amemiya Prediction Criteria
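
The prediction-criteria columns can be reproduced from n, p (coefficients including the intercept), RSS, and R². The formulas below are the standard definitions documented for olsrr's `ols_msep`, `ols_fpe`, `ols_hsp`, and `ols_apc` (taken as an assumption here, not quoted from the output). A sketch checking model 6 (X3 X4 X6 X7 X8 X9), using RSS = 3.801 from the backward summaries:

```python
# Verify the prediction criteria for model 6 of the subsets table.
n, p = 30, 7                 # 6 predictors + intercept
rss, r2 = 3.801, 0.9468
mse = rss / (n - p)

fpe  = mse * (n + p) / n                            # Final Prediction Error
hsp  = mse / (n - p - 1)                            # Hocking's Sp
apc  = (n + p) / (n - p) * (1 - r2)                 # Amemiya Prediction Criterion
msep = mse * (n + 1) * (n - 2) / (n * (n - p - 1))  # est. error of prediction
print(fpe, hsp, apc, msep)
# compare row 6: 0.2038, 0.0075, 0.0857, 0.2174 (last-digit differences
# come from the rounded RSS and R-Square inputs)
```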

##        Best Subsets Regression        
## --------------------------------------
## Model Index    Predictors
## --------------------------------------
##      1         X1                      
##      2         X3 X7                   
##      3         X1 X3 X7                
##      4         X1 X3 X6 X7             
##      5         X1 X3 X7 X8 X9          
##      6         X1 X3 X6 X7 X8 X9       
##      7         X1 X2 X3 X6 X7 X8 X9    
##      8         X1 X2 X3 X5 X6 X7 X8 X9 
## --------------------------------------
## 
##                                                   Subsets Regression Summary                                                   
## -------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred                                                                                         
## Model    R-Square    R-Square    R-Square      C(p)        AIC        SBIC        SBC       MSEP      FPE       HSP       APC  
## -------------------------------------------------------------------------------------------------------------------------------
##   1        0.5266      0.5097      0.4658    154.8516    94.7131      4.8488    98.9167    1.2935    1.2876    0.0447    0.5411 
##   2        0.8317      0.8192      0.7983     40.2985    65.6888    -23.2171    71.2936    0.4953    0.4896    0.0171    0.2057 
##   3        0.8718      0.8570      0.8308     26.9889    59.5306    -29.0072    66.5365    0.4075    0.3991    0.0141    0.1677 
##   4        0.8932      0.8761      0.8454     20.8073    56.0486    -31.8764    64.4557    0.3678    0.3559    0.0127    0.1496 
##   5        0.9090      0.8900      0.8567     16.7678    53.2435    -33.5760    63.0518    0.3406    0.3249    0.0118    0.1365 
##   6        0.9428      0.9279      0.9003      5.8470    41.3046    -38.8856    52.5142    0.2335    0.2189    0.0081    0.0920 
##   7        0.9444      0.9267      0.8975      7.2445    42.4657    -36.4162    55.0764    0.2486    0.2286    0.0086    0.0961 
##   8        0.9450      0.9241      0.8902      9.0000    44.1184    -33.6709    58.1304    0.2704    0.2430    0.0093    0.1021 
## -------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria 
##  SBIC: Sawa's Bayesian Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
##  MSEP: Estimated error of prediction, assuming multivariate normality 
##  FPE: Final Prediction Error 
##  HSP: Hocking's Sp 
##  APC: Amemiya Prediction Criteria

##        Best Subsets Regression        
## --------------------------------------
## Model Index    Predictors
## --------------------------------------
##      1         X4                      
##      2         X3 X4                   
##      3         X3 X4 X7                
##      4         X3 X4 X8 X9             
##      5         X3 X4 X7 X8 X9          
##      6         X3 X4 X6 X7 X8 X9       
##      7         X3 X4 X5 X6 X7 X8 X9    
##      8         X2 X3 X4 X5 X6 X7 X8 X9 
## --------------------------------------
## 
##                                                   Subsets Regression Summary                                                  
## ------------------------------------------------------------------------------------------------------------------------------
##                        Adj.        Pred                                                                                        
## Model    R-Square    R-Square    R-Square     C(p)        AIC        SBIC        SBC       MSEP      FPE       HSP       APC  
## ------------------------------------------------------------------------------------------------------------------------------
##   1        0.8030      0.7960      0.7717    52.5895    68.4060    -19.9679    72.6096    0.5382    0.5357    0.0186    0.2251 
##   2        0.8731      0.8637      0.8435    26.6181    57.2082    -30.7039    62.8130    0.3733    0.3690    0.0129    0.1551 
##   3        0.8904      0.8777      0.8539    21.7454    54.8305    -33.0170    61.8365    0.3484    0.3412    0.0120    0.1434 
##   4        0.9156      0.9020      0.8789    13.6900    48.9949    -37.2607    57.4021    0.2907    0.2813    0.0100    0.1182 
##   5        0.9320      0.9178      0.8917     9.1352    44.5038    -39.3879    54.3122    0.2545    0.2428    0.0088    0.1020 
##   6        0.9468      0.9329      0.9084     5.2440    39.1611    -40.5447    50.3706    0.2174    0.2038    0.0075    0.0857 
##   7        0.9472      0.9304      0.9021     7.0612    40.9019    -37.8040    53.5126    0.2360    0.2170    0.0082    0.0912 
##   8        0.9474      0.9273      0.8957     9.0000    42.8146    -34.9748    56.8265    0.2589    0.2326    0.0089    0.0977 
## ------------------------------------------------------------------------------------------------------------------------------
## AIC: Akaike Information Criteria 
##  SBIC: Sawa's Bayesian Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
##  MSEP: Estimated error of prediction, assuming multivariate normality 
##  FPE: Final Prediction Error 
##  HSP: Hocking's Sp 
##  APC: Amemiya Prediction Criteria

(d) (7) Models Comparison

  • Additional all-subsets comparison: every candidate model ranked by R-Square, Adj. R-Square, and Mallow's Cp
## # A tibble: 511 x 6
##    Index     N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
##  * <int> <int> <chr>           <dbl>           <dbl>         <dbl>
##  1     1     1 X4            0.803           0.796            48.9
##  2     2     1 X1            0.527           0.510           154. 
##  3     3     1 X5            0.523           0.506           155. 
##  4     4     1 X2            0.433           0.413           189. 
##  5     5     1 X7            0.350           0.327           221. 
##  6     6     1 X3            0.227           0.199           268. 
##  7     7     1 X8            0.0407          0.00648         339. 
##  8     8     1 X9            0.0176         -0.0175          347. 
##  9     9     1 X6            0.00292        -0.0327          353. 
## 10    10     2 X3 X4         0.873           0.864            24.2
## # ... with 501 more rows
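
The Adj. R-Square column in these tibbles is the usual 1 − (1 − R²)(n − 1)/(n − k − 1) with n = 30. A quick check against the rows shown above:

```python
def adj_r2(r2, n, k):
    """Adjusted R-squared; k = number of predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 30
print(adj_r2(0.803, n, 1))  # X4 alone: ~0.796
print(adj_r2(0.873, n, 2))  # X3 X4:   ~0.864
```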

## # A tibble: 255 x 6
##    Index     N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
##  * <int> <int> <chr>           <dbl>           <dbl>         <dbl>
##  1     1     1 X1            0.527           0.510           155. 
##  2     2     1 X5            0.523           0.506           156. 
##  3     3     1 X2            0.433           0.413           190. 
##  4     4     1 X7            0.350           0.327           222. 
##  5     5     1 X3            0.227           0.199           269. 
##  6     6     1 X8            0.0407          0.00648         340. 
##  7     7     1 X9            0.0176         -0.0175          349. 
##  8     8     1 X6            0.00292        -0.0327          355. 
##  9     9     2 X3 X7         0.832           0.819            40.3
## 10    10     2 X1 X3         0.812           0.798            47.8
## # ... with 245 more rows

## # A tibble: 255 x 6
##    Index     N Predictors `R-Square` `Adj. R-Square` `Mallow's Cp`
##  * <int> <int> <chr>           <dbl>           <dbl>         <dbl>
##  1     1     1 X4            0.803           0.796            52.6
##  2     2     1 X5            0.523           0.506           164. 
##  3     3     1 X2            0.433           0.413           200. 
##  4     4     1 X7            0.350           0.327           233. 
##  5     5     1 X3            0.227           0.199           283. 
##  6     6     1 X8            0.0407          0.00648         357. 
##  7     7     1 X9            0.0176         -0.0175          366. 
##  8     8     1 X6            0.00292        -0.0327          372. 
##  9     9     2 X3 X4         0.873           0.864            26.6
## 10    10     2 X3 X7         0.832           0.819            43.2
## # ... with 245 more rows

## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. X1 
## 2. X2 
## 3. X3 
## 4. X4 
## 5. X5 
## 6. X6 
## 7. X7 
## 8. X8 
## 9. X9 
## 
## We are selecting variables based on p value...
## 
## Variables Entered/Removed: 
## 
## - X4 added 
## - X3 added 
## - X7 added 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.944       RMSE               0.549 
## R-Squared               0.890       Coef. Var          8.618 
## Adj. R-Squared          0.878       MSE                0.301 
## Pred R-Squared          0.854       MAE                0.414 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     63.565         3         21.188    70.378    0.0000 
## Residual        7.828        26          0.301                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                  Parameter Estimates                                  
## -------------------------------------------------------------------------------------
##       model     Beta    Std. Error    Std. Beta      t       Sig      lower    upper 
## -------------------------------------------------------------------------------------
## (Intercept)    2.872         0.547                 5.254    0.000     1.748    3.995 
##          X4    0.122         0.033        0.559    3.730    0.001     0.055    0.189 
##          X3    0.168         0.040        0.435    4.165    0.000     0.085    0.251 
##          X7    3.106         1.537        0.309    2.021    0.054    -0.053    6.266 
## -------------------------------------------------------------------------------------
## 
##                              Stepwise Selection Summary                              
## ------------------------------------------------------------------------------------
##                      Added/                   Adj.                                      
## Step    Variable    Removed     R-Square    R-Square     C(p)        AIC       RMSE     
## ------------------------------------------------------------------------------------
##    1       X4       addition       0.803       0.796    48.8550    68.4060    0.7087    
##    2       X3       addition       0.873       0.864    24.2130    57.2082    0.5792    
##    3       X7       addition       0.890       0.878    19.6670    54.8305    0.5487    
## ------------------------------------------------------------------------------------

## Stepwise Selection Method 
## -------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X4 
## 5 . X5 
## 6 . X6 
## 7 . X7 
## 8 . X8 
## 9 . X9 
## 
## 
## Variables Entered/Removed: 
## 
## - X4 added 
## - X3 added 
## - X7 added 
## - X8 added 
## - X9 added 
## - X6 added 
## 
## No more variables to be added or removed.
## 
## 
##                               Stepwise Summary                              
## --------------------------------------------------------------------------
## Variable     Method      AIC       RSS      Sum Sq     R-Sq      Adj. R-Sq 
## --------------------------------------------------------------------------
## X4          addition    68.406    14.063    57.330    0.80302      0.79599 
## X3          addition    57.208     9.057    62.335    0.87313      0.86373 
## X7          addition    54.830     7.828    63.565    0.89036      0.87771 
## X8          addition    54.522     7.248    64.144    0.89848      0.88223 
## X9          addition    44.504     4.856    66.537    0.93199      0.91782 
## X6          addition    39.161     3.801    67.591    0.94675      0.93286 
## --------------------------------------------------------------------------

## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. X1 
## 2. X2 
## 3. X3 
## 4. X5 
## 5. X6 
## 6. X7 
## 7. X8 
## 8. X9 
## 
## We are selecting variables based on p value...
## 
## Variables Entered/Removed: 
## 
## - X1 added 
## - X3 added 
## - X7 added 
## - X6 added 
## - X8 added 
## - X9 added 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.971       RMSE               0.421 
## R-Squared               0.943       Coef. Var          6.618 
## Adj. R-Squared          0.928       MSE                0.178 
## Pred R-Squared          0.900       MAE                0.292 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.310         6         11.218    63.195    0.0000 
## Residual        4.083        23          0.178                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.307         0.410                  5.623    0.000     1.458     3.156 
##          X1     0.207         0.053        0.368     3.897    0.001     0.097     0.317 
##          X3     0.263         0.022        0.680    11.944    0.000     0.217     0.308 
##          X7     5.453         1.002        0.542     5.442    0.000     3.380     7.525 
##          X6    -0.532         0.144       -0.192    -3.688    0.001    -0.831    -0.234 
##          X8     0.613         0.137        0.495     4.462    0.000     0.329     0.897 
##          X9    -0.433         0.112       -0.435    -3.864    0.001    -0.665    -0.201 
## ----------------------------------------------------------------------------------------
## 
##                              Stepwise Selection Summary                               
## -------------------------------------------------------------------------------------
##                      Added/                   Adj.                                       
## Step    Variable    Removed     R-Square    R-Square      C(p)        AIC       RMSE     
## -------------------------------------------------------------------------------------
##    1       X1       addition       0.527       0.510    154.8520    94.7131    1.0987    
##    2       X3       addition       0.812       0.798     47.7990    68.9988    0.7050    
##    3       X7       addition       0.872       0.857     26.9890    59.5306    0.5934    
##    4       X6       addition       0.893       0.876     20.8070    56.0486    0.5523    
##    5       X8       addition       0.906       0.886     18.0270    54.3108    0.5297    
##    6       X9       addition       0.943       0.928      5.8470    41.3046    0.4213    
## -------------------------------------------------------------------------------------

## Stepwise Selection Method 
## -------------------------
## 
## Candidate Terms: 
## 
## 1 . X1 
## 2 . X2 
## 3 . X3 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## 
## Variables Entered/Removed: 
## 
## - X1 added 
## - X3 added 
## - X7 added 
## - X6 added 
## - X8 added 
## - X9 added 
## 
## No more variables to be added or removed.
## 
## 
##                               Stepwise Summary                              
## --------------------------------------------------------------------------
## Variable     Method      AIC       RSS      Sum Sq     R-Sq      Adj. R-Sq 
## --------------------------------------------------------------------------
## X1          addition    94.713    33.799    37.594    0.52658      0.50967 
## X3          addition    68.999    13.418    57.974    0.81205      0.79813 
## X7          addition    59.531     9.155    62.237    0.87176      0.85696 
## X6          addition    56.049     7.626    63.766    0.89318      0.87609 
## X8          addition    54.311     6.733    64.660    0.90569      0.88604 
## X9          addition    41.305     4.083    67.310    0.94281      0.92789 
## --------------------------------------------------------------------------

## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. X2 
## 2. X3 
## 3. X4 
## 4. X5 
## 5. X6 
## 6. X7 
## 7. X8 
## 8. X9 
## 
## We are selecting variables based on p value...
## 
## Variables Entered/Removed: 
## 
## - X4 added 
## - X3 added 
## - X7 added 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.944       RMSE               0.549 
## R-Squared               0.890       Coef. Var          8.618 
## Adj. R-Squared          0.878       MSE                0.301 
## Pred R-Squared          0.854       MAE                0.414 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     63.565         3         21.188    70.378    0.0000 
## Residual        7.828        26          0.301                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                  Parameter Estimates                                  
## -------------------------------------------------------------------------------------
##       model     Beta    Std. Error    Std. Beta      t       Sig      lower    upper 
## -------------------------------------------------------------------------------------
## (Intercept)    2.872         0.547                 5.254    0.000     1.748    3.995 
##          X4    0.122         0.033        0.559    3.730    0.001     0.055    0.189 
##          X3    0.168         0.040        0.435    4.165    0.000     0.085    0.251 
##          X7    3.106         1.537        0.309    2.021    0.054    -0.053    6.266 
## -------------------------------------------------------------------------------------
## 
##                              Stepwise Selection Summary                              
## ------------------------------------------------------------------------------------
##                      Added/                   Adj.                                      
## Step    Variable    Removed     R-Square    R-Square     C(p)        AIC       RMSE     
## ------------------------------------------------------------------------------------
##    1       X4       addition       0.803       0.796    52.5890    68.4060    0.7087    
##    2       X3       addition       0.873       0.864    26.6180    57.2082    0.5792    
##    3       X7       addition       0.890       0.878    21.7450    54.8305    0.5487    
## ------------------------------------------------------------------------------------

## Stepwise Selection Method 
## -------------------------
## 
## Candidate Terms: 
## 
## 1 . X2 
## 2 . X3 
## 3 . X4 
## 4 . X5 
## 5 . X6 
## 6 . X7 
## 7 . X8 
## 8 . X9 
## 
## 
## Variables Entered/Removed: 
## 
## - X4 added 
## - X3 added 
## - X7 added 
## - X8 added 
## - X9 added 
## - X6 added 
## 
## No more variables to be added or removed.
## 
## 
##                               Stepwise Summary                              
## --------------------------------------------------------------------------
## Variable     Method      AIC       RSS      Sum Sq     R-Sq      Adj. R-Sq 
## --------------------------------------------------------------------------
## X4          addition    68.406    14.063    57.330    0.80302      0.79599 
## X3          addition    57.208     9.057    62.335    0.87313      0.86373 
## X7          addition    54.830     7.828    63.565    0.89036      0.87771 
## X8          addition    54.522     7.248    64.144    0.89848      0.88223 
## X9          addition    44.504     4.856    66.537    0.93199      0.91782 
## X6          addition    39.161     3.801    67.591    0.94675      0.93286 
## --------------------------------------------------------------------------

  • Model 437896
# build model 437896
model_wf_437896_log <- lm(log(y) ~ X4 + X3 + X7 + X8 + X9 + X6, data=table_wf)
ols_regress(model_wf_437896_log)
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.973       RMSE               0.407 
## R-Squared               0.947       Coef. Var          6.385 
## Adj. R-Squared          0.933       MSE                0.165 
## Pred R-Squared          0.908       MAE                0.273 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.591         6         11.265     68.16    0.0000 
## Residual        3.801        23          0.165                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.692         0.445                  6.046    0.000     1.771     3.613 
##          X4     0.109         0.026        0.499     4.244    0.000     0.056     0.162 
##          X3     0.184         0.032        0.476     5.698    0.000     0.117     0.251 
##          X7     4.085         1.213        0.406     3.367    0.003     1.575     6.595 
##          X8     0.612         0.133        0.493     4.614    0.000     0.337     0.886 
##          X9    -0.448         0.108       -0.450    -4.135    0.000    -0.672    -0.224 
##          X6    -0.368         0.146       -0.133    -2.526    0.019    -0.669    -0.066 
## ----------------------------------------------------------------------------------------
# Collinearity Diagnostics #
ols_coll_diag(model_wf_437896_log)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 6 x 3
##   Variables Tolerance   VIF
##   <chr>         <dbl> <dbl>
## 1 X4            0.167  5.97
## 2 X3            0.332  3.01
## 3 X7            0.159  6.28
## 4 X8            0.202  4.94
## 5 X9            0.195  5.12
## 6 X6            0.839  1.19
## 
## 
## Eigenvalue and Condition Index
## ------------------------------
##   Eigenvalue Condition Index    intercept           X4          X3           X7           X8           X9         X6
## 1 6.06603799        1.000000 0.0007146972 0.0012879224 0.001577168 6.255392e-04 0.0008014993 0.0009066212 0.00304547
## 2 0.33834763        4.234196 0.0025641623 0.1040613624 0.005790171 1.173277e-02 0.0084021537 0.0065708241 0.02244379
## 3 0.31443225        4.392270 0.0007827736 0.0005853923 0.117541638 3.247682e-03 0.0152300675 0.0281423928 0.02635038
## 4 0.18092020        5.790406 0.0043202653 0.0171208486 0.087238632 1.883680e-02 0.0099587626 0.0185479834 0.30810079
## 5 0.07065103        9.266022 0.1767257312 0.0717821748 0.003008880 4.520519e-02 0.0114424989 0.0089921454 0.54826833
## 6 0.01847255       18.121293 0.0001449205 0.0053427347 0.008960972 7.477679e-06 0.9456383423 0.9366327464 0.03064530
## 7 0.01113836       23.336833 0.8147474499 0.7998195649 0.775882539 9.203445e-01 0.0085266757 0.0002072867 0.06114594
#Model Fit Assessment
ols_plot_diagnostics(model_wf_437896_log)

# Correlation test for residual normality
ols_test_correlation(model_wf_437896_log) # Correlation between observed residuals and expected residuals under normality.
## [1] 0.9837263
# Residual Normality Test
ols_test_normality(model_wf_437896_log) # Test for detecting violation of normality assumption. #If p-value is bigger, then no problem of non-normality #
## -----------------------------------------------
##        Test             Statistic       pvalue  
## -----------------------------------------------
## Shapiro-Wilk              0.9728         0.6175 
## Kolmogorov-Smirnov        0.0997         0.8982 
## Cramer-von Mises          4.8429         0.0000 
## Anderson-Darling          0.2996         0.5612 
## -----------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_437896_log)

# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_437896_log)

  • Model 437
# build model 437
model_wf_437_log <- lm(log(y) ~ X4 + X3 + X7, data=table_wf)
ols_regress(model_wf_437_log)
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.944       RMSE               0.549 
## R-Squared               0.890       Coef. Var          8.618 
## Adj. R-Squared          0.878       MSE                0.301 
## Pred R-Squared          0.854       MAE                0.414 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     63.565         3         21.188    70.378    0.0000 
## Residual        7.828        26          0.301                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                  Parameter Estimates                                  
## -------------------------------------------------------------------------------------
##       model     Beta    Std. Error    Std. Beta      t       Sig      lower    upper 
## -------------------------------------------------------------------------------------
## (Intercept)    2.872         0.547                 5.254    0.000     1.748    3.995 
##          X4    0.122         0.033        0.559    3.730    0.001     0.055    0.189 
##          X3    0.168         0.040        0.435    4.165    0.000     0.085    0.251 
##          X7    3.106         1.537        0.309    2.021    0.054    -0.053    6.266 
## -------------------------------------------------------------------------------------
# Collinearity Diagnostics #
ols_coll_diag(model_wf_437_log)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 3 x 3
##   Variables Tolerance   VIF
##   <chr>         <dbl> <dbl>
## 1 X4            0.188  5.32
## 2 X3            0.386  2.59
## 3 X7            0.181  5.53
## 
## 
## Eigenvalue and Condition Index
## ------------------------------
##   Eigenvalue Condition Index   intercept          X4          X3          X7
## 1 3.51967088        1.000000 0.002498604 0.004729201 0.005661232 0.002143973
## 2 0.30192504        3.414298 0.006749341 0.054452269 0.142053567 0.018736742
## 3 0.16605489        4.603893 0.068513979 0.150366864 0.078164912 0.024726360
## 4 0.01234919       16.882304 0.922238076 0.790451666 0.774120289 0.954392926
#Model Fit Assessment
ols_plot_diagnostics(model_wf_437_log)

# Correlation test for residual normality
ols_test_correlation(model_wf_437_log) # Correlation between observed residuals and expected residuals under normality.
## [1] 0.9856766
# Residual Normality Test
ols_test_normality(model_wf_437_log) # Test for detecting violation of normality assumption. #If p-value is bigger, then no problem of non-normality #
## -----------------------------------------------
##        Test             Statistic       pvalue  
## -----------------------------------------------
## Shapiro-Wilk              0.9765         0.7267 
## Kolmogorov-Smirnov        0.1033         0.8736 
## Cramer-von Mises          3.1908         0.0000 
## Anderson-Darling          0.3511         0.4469 
## -----------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_437_log)

# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_437_log)

  • Model 137689
# build model 137689
model_wf_137689_log <- lm(log(y) ~ X1 + X3 + X7 + X6 + X8 + X9, data=table_wf)
ols_regress(model_wf_137689_log)
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.971       RMSE               0.421 
## R-Squared               0.943       Coef. Var          6.618 
## Adj. R-Squared          0.928       MSE                0.178 
## Pred R-Squared          0.900       MAE                0.292 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression     67.310         6         11.218    63.195    0.0000 
## Residual        4.083        23          0.178                     
## Total          71.393        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)     2.307         0.410                  5.623    0.000     1.458     3.156 
##          X1     0.207         0.053        0.368     3.897    0.001     0.097     0.317 
##          X3     0.263         0.022        0.680    11.944    0.000     0.217     0.308 
##          X7     5.453         1.002        0.542     5.442    0.000     3.380     7.525 
##          X6    -0.532         0.144       -0.192    -3.688    0.001    -0.831    -0.234 
##          X8     0.613         0.137        0.495     4.462    0.000     0.329     0.897 
##          X9    -0.433         0.112       -0.435    -3.864    0.001    -0.665    -0.201 
## ----------------------------------------------------------------------------------------
# Collinearity Diagnostics #
ols_coll_diag(model_wf_137689_log)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 6 x 3
##   Variables Tolerance   VIF
##   <chr>         <dbl> <dbl>
## 1 X1            0.279  3.58
## 2 X3            0.768  1.30
## 3 X7            0.251  3.99
## 4 X6            0.917  1.09
## 5 X8            0.202  4.94
## 6 X9            0.196  5.10
## 
## 
## Eigenvalue and Condition Index
## ------------------------------
##   Eigenvalue Condition Index    intercept          X1          X3           X7          X6           X8           X9
## 1 5.87754632        1.000000 0.0009583976 0.002753729 0.003752062 0.0010622321 0.003554176 0.0008552526 0.0009724289
## 2 0.55370282        3.258064 0.0022409295 0.184392777 0.048522556 0.0065506985 0.008922660 0.0011708384 0.0004973735
## 3 0.29276462        4.480626 0.0006141795 0.021479187 0.183073381 0.0001146981 0.036353961 0.0263007081 0.0424526993
## 4 0.15641976        6.129884 0.0050342867 0.054272760 0.378429650 0.0153101550 0.417215036 0.0036091737 0.0092883543
## 5 0.08151723        8.491283 0.1288772252 0.136506302 0.004864520 0.1464814595 0.482676303 0.0146393598 0.0085093969
## 6 0.02005845       17.117856 0.6268955520 0.471310996 0.317220373 0.5780521912 0.030882409 0.1844189407 0.2640851038
## 7 0.01799079       18.074773 0.2353794295 0.129284248 0.064137458 0.2524285656 0.020395454 0.7690057267 0.6741946434
#Model Fit Assessment
ols_plot_diagnostics(model_wf_137689_log)

# Correlation test for residual normality
ols_test_correlation(model_wf_137689_log) # Correlation between observed residuals and expected residuals under normality.
## [1] 0.988106
# Residual Normality Test
ols_test_normality(model_wf_137689_log) # Test for detecting violation of normality assumption. #If p-value is bigger, then no problem of non-normality #
## -----------------------------------------------
##        Test             Statistic       pvalue  
## -----------------------------------------------
## Shapiro-Wilk              0.9769         0.7382 
## Kolmogorov-Smirnov        0.0771         0.9881 
## Cramer-von Mises          4.4689         0.0000 
## Anderson-Darling          0.1644         0.9350 
## -----------------------------------------------
# Variable Contributions
ols_plot_added_variable(model_wf_137689_log)

# Residual Plus Component Plot
ols_plot_comp_plus_resid(model_wf_137689_log)

Final models

#Lack of Fit F Test
ols_pure_error_anova(lm(y~X8, data = table_wf))
## Lack of Fit F Test 
## ---------------
## Response :   y 
## Predictor:   X8 
## 
##                        Analysis of Variance Table                         
## -------------------------------------------------------------------------
##                 DF      Sum Sq        Mean Sq      F Value       Pr(>F)   
## -------------------------------------------------------------------------
## X8               1     4616882.92    4616882.92    5.795558    0.02290414 
## Residual        28    36951252.44    1319687.59                           
##  Lack of fit    21    31374881.28    1494041.97    1.875466     0.2003839 
##  Pure Error      7     5576371.17     796624.45                           
## -------------------------------------------------------------------------
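The same decomposition can be reproduced by hand: SSE splits into pure-error SS (within-group variation at repeated predictor values) and lack-of-fit SS. A minimal sketch on `mtcars` using `cyl`, which has many replicate values (the variable names here are illustrative; `table_wf` is not needed):

```r
# Lack-of-fit F test by hand: SSE = SS(lack of fit) + SS(pure error)
fit <- lm(mpg ~ cyl, data = mtcars)
sse <- sum(resid(fit)^2)

# pure-error SS: squared deviations from the group mean at each repeated cyl value
ss_pe  <- sum(ave(mtcars$mpg, mtcars$cyl, FUN = function(v) (v - mean(v))^2))
ss_lof <- sse - ss_pe

n <- nrow(mtcars)                  # observations
m <- length(unique(mtcars$cyl))    # distinct predictor values
p <- length(coef(fit))             # model parameters
F_lof <- (ss_lof / (m - p)) / (ss_pe / (n - m))
pf(F_lof, m - p, n - m, lower.tail = FALSE)   # p-value for lack of fit
```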
ols_press(model_wf_437896_log)
## [1] 6.538275
ols_press(model_wf_437_log)
## [1] 10.43262
ols_press(model_wf_137689_log)
## [1] 7.114336
# prediction power (predicted R-squared) of model 437896;
# the model is fit on log(y), so the total sum of squares must be on the log scale too
1-((ols_press(model_wf_437896_log))/(var(log(table_wf$y))*(nrow(table_wf)-1)))
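PRESS itself can be computed without refitting: for OLS, the leave-one-out residual equals e_i / (1 - h_i), where h_i is the leverage. A base-R sketch on `mtcars` (the predictors here are just for illustration):

```r
# PRESS = sum of squared leave-one-out prediction errors; with OLS the
# LOO residual is e_i / (1 - h_i), so no refitting is needed
fit <- lm(mpg ~ wt + hp, data = mtcars)
press <- sum((resid(fit) / (1 - hatvalues(fit)))^2)

# predicted R-squared = 1 - PRESS/SST (SST on the same scale as the response)
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
pred_r2 <- 1 - press / sst
```

PRESS is always larger than SSE (each leverage inflates the residual), so the predicted R-squared is always below the ordinary R-squared.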

library(texreg)
# Pretty print regression results on screen
lm(mpg ~ wt, data=mtcars) %>% screenreg
texreg::screenreg(l=list(model_2_7_8))

# visualizng
library(GGally)
ggpairs(data=table_b1[c(1,3,8,9)])

# Correlation
cor(table_b1)

# Half correlation matrix:
library(corrr)
mtcars %>% correlate() %>% shave() %>% fashion()
# Visualize correlation matrix:
mtcars %>% correlate() %>% shave() %>% rplot()

# Scatterplot Matrix
mtcars[1:6] %>% plot
# Better looking version
library(ggfortify)
model_2_7_8 %>% autoplot()

# Confidence interval of coefficients
lm(mpg ~ wt + cyl, data=mtcars) %>% confint()

# Hypothesis testing of nested models
lm_mpg_wt <- lm(mpg ~ wt, data=mtcars)
lm_mpg_wt.cyl <- lm(mpg ~ wt + cyl, data=mtcars)
anova(lm_mpg_wt, lm_mpg_wt.cyl)

# convert mpg to kilometers per liter
mtcars %>% mutate(kmpl = mpg * 0.425144) %>% select(mpg, kmpl) %>% filter(mpg > 20) %>% 
  nrow()
mtcars %>% group_by(am) %>% 
  summarize(n=n(),
            mean_mpg=mean(mpg),
            sd_mpg=sd(mpg),
            min_mpg=min(mpg),
            max_mpg=max(mpg))
mtcars %>% arrange(desc(mpg))

# mean & sd
mtcars %>% summarize(am_mean=mean(am), am_sd=sd(am))
# Frequencies by categories
mtcars %>% group_by(am) %>% tally

# Assume we want to combine LA + SD to Southern CA and Bay Area and Sacramento
## as Northern CA
(californiatod <- californiatod %>% 
  mutate(transit_level=case_when(
    transit>0.4~"high",
    transit>0.2~"medium",
    TRUE ~ "low")))


## General linear F test
fit_R <- lm(mpg ~ wt, data=mtcars)
fit_F <- lm(mpg ~ wt + cyl, data=mtcars)
anova(fit_R, fit_F)

SSE_R <- resid(fit_R)^2 %>% sum
SSE_F <- resid(fit_F)^2 %>% sum
df_R <- df.residual(fit_R)
df_F <- df.residual(fit_F)
F_val <- ((SSE_R - SSE_F)/(df_R - df_F))/(SSE_F/df_F)

# Look up the critical F value for alpha=0.05
alpha <- 0.05
qf(alpha, (df_R - df_F), df_F, lower.tail=F)
# Alternatively, find the p-value corresponding to our F_val
pf(F_val, (df_R - df_F), df_F, lower.tail=F)
n <- nrow(mtcars)          # number of observations
k <- length(coef(fit_R))   # number of coefficients
## Calculate R2 and adjusted R2 manually
TSS <- sd(mtcars$mpg)^2 * (n - 1)
# OR
TSS <- var(mtcars$mpg) * (n - 1)
(R2_R <- 1 - SSE_R/TSS)
(R2_R_adj <- 1 - (SSE_R/(n - k))/(TSS/(n - 1)))

# Interaction Terms
huxreg(
  lm(houseval ~ transit, data=californiatod),
  lm(houseval ~ transit * railtype, data=californiatod),
  lm(houseval ~ transit * region, data=californiatod),
  lm(houseval ~ transit * CA, data=californiatod))

# redefine the region variables with a new reference category (4 for SD)
catod2 <- californiatod %>% mutate(region = relevel(as.factor(region), ref = 4))
lm(houseval ~ region, data=catod2)  %>%  summary

# Partial F test:
catod3 <- californiatod %>% mutate(region = ifelse(region =="LA" | region == "SD", "LA_SD", region))
lm(houseval ~ region, data=catod3)  %>% summary
anova(lm(houseval ~ region, data=catod3), lm(houseval ~ region, data=californiatod))

# Hypothesis testing of linear combination of coefficients
car::lht(model_2_7_8, "x2 = x7")
# Partial F test: H0: β2 + β3 = 0
car::lht(lm(hours ~ married*women, data=chores), "women + married:women = 0")

# linear combination of coefficients
# The point estimate is $\hat\beta_2 + \hat\beta_3$. Here the linear combination is the sum
# (rather than the difference) of two coefficients, and the standard error of the sum is:
# $\sqrt{\widehat{Var}(\hat\beta_2) + \widehat{Var}(\hat\beta_3) + 2\,\widehat{Cov}(\hat\beta_2, \hat\beta_3)}$
fit1 <- lm(hours ~ married*women, data=chores)
beta2 <- coef(fit1)["women"]
beta3 <- coef(fit1)["married:women"]
betas_vcov <- vcov(fit1)
se <- sqrt(betas_vcov["women", "women"] + betas_vcov["married:women", "married:women"] + 2 * betas_vcov["women", "married:women"])
(t_stat <- (beta2 + beta3)/se)

## Degrees of Freedom
dof <- fit1$df.residual

## compare t_stat to critical t-value
(t_crit <- qt(0.025, df=dof, lower.tail = F))
## OR find the corresponding p-value
(p_val <- 2 * pt(t_stat, lower.tail = F, df=dof))
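Since the `chores` data is not shown here, the same construction can be checked on `mtcars` (an illustrative analogue, not the original model): the slope of `wt` when `am == 1` is β_wt + β_wt:am, and reversing the dummy makes the refitted `wt` coefficient report that sum and its standard error directly.

```r
# Slope of wt at am == 1 equals beta_wt + beta_wt:am in the interaction model
fit  <- lm(mpg ~ wt * am, data = mtcars)
V    <- vcov(fit)
est  <- coef(fit)["wt"] + coef(fit)["wt:am"]
se   <- sqrt(V["wt", "wt"] + V["wt:am", "wt:am"] + 2 * V["wt", "wt:am"])

# cross-check: reverse the dummy so "wt" is directly the slope at am == 1
fit2 <- lm(mpg ~ wt * I(1 - am), data = mtcars)
c(est = unname(est), se = unname(se))
```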

# Partial F test on the nonlinear term
anova(lm(houseval ~ density, data=californiatod),lm(houseval ~ density + I(density^2), data=californiatod))
# To be on the safe side, enclose your transformation in an I() function. This is not necessary for log transformations.

library(olsrr)
# leverage (hat)
leverage <- ols_leverage(lm_sfr)
ols_rsdlev_plot(lm_sfr)
# Cook's distance
ols_cooksd_chart(lm_sfr)
# DFFITS
ols_dffits_plot(lm_sfr)
# DFBETAS
ols_dfbetas_panel(lm_sfr)
# Heteroskedasticity
ols_rvsp_plot(lm_sfr)
ols_rsd_qqplot(lm_sfr)
# hypothesis test of normality of residuals
ols_norm_test(lm_sfr)
# Test of Heteroskedasticity with Breusch-Pagan Test
ols_bp_test(lm_sfr)
#Heteroskedasticity-Consistent Standard Errors
# standard variance-covariance matrix
vcov0 <- vcov(lm_sfr)
vcov(model_2_7_8)
# convert to correlation
cov2cor(vcov0)
# Heteroskedasticity-Consistent variance covariance matrix
require(car)
vcov_hc3 <- hccm(lm_sfr, type="hc3")
# In presence of Heteroskedasticity, vcov_hc3 is larger than vcov0, to redo hypothesis tests
# with the Heteroskedasticity-Consistent variance covariance matrix
if (!require(lmtest)) { install.packages("lmtest"); library(lmtest) }
coeftest(lm_sfr, vcov_hc3)
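Under the hood, HC3 is a sandwich estimator with each squared residual inflated by (1 - h_i)^2. A base-R sketch on `mtcars` (`lm_sfr` is not available here, so the predictors are illustrative):

```r
# HC3 sandwich: (X'X)^{-1} [ sum_i x_i x_i' e_i^2/(1-h_i)^2 ] (X'X)^{-1}
fit <- lm(mpg ~ wt + hp, data = mtcars)
X   <- model.matrix(fit)
u   <- resid(fit) / (1 - hatvalues(fit))   # HC3-weighted residuals
bread <- solve(crossprod(X))
meat  <- crossprod(X * u)                  # = t(X) %*% diag(u^2) %*% X
vcov_hc3_manual <- bread %*% meat %*% bread
sqrt(diag(vcov_hc3_manual))                # robust standard errors
```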
# All possible subset
sfrmodel <- lm(TOTALVAL ~ BLDGSQFT + YEARBUILT + GIS_ACRES + dpioneer + dfwy + dpark + dmax + dbikehq, data = taxlot_sfr)
(sfrmodel_all_subset <- ols_all_subset(sfrmodel))
# Best Subset Regression
ols_best_subset(model_2_7_8)
# Multicollinary with VIF
ols_vif_tol(lm_sfr)
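VIF_j is defined as 1/(1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A hand computation on `mtcars` (illustrative predictors):

```r
# VIF for wt in a model containing wt, hp, disp: regress wt on the others
fit_j  <- lm(wt ~ hp + disp, data = mtcars)
r2_j   <- summary(fit_j)$r.squared
vif_wt <- 1 / (1 - r2_j)
vif_wt   # values near 1 mean little collinearity; large values flag a problem
```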
## Stepwise Forward Regression
# based on p-value
(sfrmodel_stepfwd.p <- ols_step_forward(sfrmodel))
# based on AIC
(sfrmodel_stepfwd.aic <- ols_stepaic_forward(sfrmodel))
## Stepwise Backward Regression
# based on p-value
(sfrmodel_stepbwd.p <- ols_step_backward(sfrmodel))
# based on AIC
(sfrmodel_stepbwd.aic <- ols_stepaic_backward(sfrmodel))
## Step AIC regression
# Builds a regression model from a set of candidate predictors by entering and removing
# predictors based on the Akaike Information Criterion, stepwise, until no variable is
# left to enter or remove. The starting model should include all candidate predictors.
(sfrmodel_stepboth.aic <- ols_stepaic_both(sfrmodel))
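The same stepwise-AIC search is available in base R's recommended `MASS` package; a self-contained sketch on `mtcars` (the candidate predictors are illustrative):

```r
# MASS::stepAIC: stepwise search in both directions, silenced with trace = FALSE
library(MASS)
full     <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
step_fit <- stepAIC(full, direction = "both", trace = FALSE)
formula(step_fit)   # the AIC-selected model
```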

# Cross Validation: CV assesses how the results of a model will generalize to an independent data set.
# It is mainly used when the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice.
library(modelr)
library(purrr)
(taxlot_sfr_kcv <- taxlot_sfr %>% 
  modelr::crossv_kfold() %>% 
  mutate(model=map(train, ~lm(TOTALVAL~BLDGSQFT+YEARBUILT+GIS_ACRES+dpioneer+dfwy, data=.x)),
         rmse=map2_dbl(model, test, modelr::rmse),
         rsquare=map2_dbl(model, test, modelr::rsquare)))
taxlot_sfr_kcv %>% 
  summarise_at(c("rmse", "rsquare"), mean)
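Since `taxlot_sfr` is not available here, the same k-fold idea can be sketched in base R on `mtcars` (fold count and predictors are illustrative):

```r
# Minimal k-fold CV in base R: out-of-fold RMSE for mpg ~ wt + hp
set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))
rmse_k <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  pred  <- predict(lm(mpg ~ wt + hp, data = train), newdata = test)
  sqrt(mean((test$mpg - pred)^2))
})
mean(rmse_k)   # cross-validated RMSE
```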

## DID omitted
## Discrete Outcome: Count/Poisson Regression
require(MASS)
require(huxtable)
fit_lm <- lm(carb ~ mpg + qsec, data=mtcars)
fit_glm <- glm(carb ~ mpg + qsec, data=mtcars, family="poisson")
huxreg(OLS=fit_lm, Poisson=fit_glm)

fit_lm <- lm(am ~ qsec + hp, data=mtcars)
fit_glm <- glm(am ~ qsec + hp, data=mtcars, family=binomial("logit"))
huxreg(OLS=fit_lm, logit=fit_glm)

# Log-likelihood
logLik(fit_glm)
fit_glm0 <- update(fit_glm, . ~ 1) # intercept-only (null) model
logLik(fit_glm0)
## 'log Lik.' -21.61487 (df=1)
# McFadden's pseudo R2: 1 - logLik(full)/logLik(null)
1 - logLik(fit_glm)/logLik(fit_glm0)
## 'log Lik.' 0.381052 (df=3)
# Interpretation of coefficients
# odds ratios: exp(beta)
(odds <- exp(coef(fit_glm)))
# probability implied by each odds value; for the intercept this is the
# baseline probability when all predictors are zero
odds/(1 + odds)
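The odds-to-probability conversion above works coefficient by coefficient; for fitted probabilities at actual predictor values, predict() with type = "response" applies the full inverse logit. A runnable check on the same built-in mtcars model:

```r
fit_glm <- glm(am ~ qsec + hp, data = mtcars, family = binomial("logit"))

# fitted P(am = 1) for each observation
p_hat <- predict(fit_glm, type = "response")

# manual check for the first observation: inverse logit of x'beta
eta <- coef(fit_glm)[1] + coef(fit_glm)[2] * mtcars$qsec[1] + coef(fit_glm)[3] * mtcars$hp[1]
all.equal(unname(plogis(eta)), unname(p_hat[1]))
```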

huxtable::huxreg(model_2_7_8, statistics = NULL)

library(leaps) # Load the package #
model_wf_subset <- regsubsets(log(y) ~ X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data = table_wf, nbest = 10) # nbest is the number of models kept per model size #
summary(model_wf_subset) # Hard to read output from this #

## plot adjusted R square for each model ##
plot(model_wf_subset, scale='adjr2')
## can use Cp, r2 or bic for scale ##
plot(model_wf_subset, scale='bic')
plot(model_wf_subset, scale='Cp')
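Instead of reading the plots, the best model per criterion can be pulled from the regsubsets summary object directly; a sketch on built-in data, since table_wf is not available here:

```r
library(leaps)

subs <- regsubsets(mpg ~ wt + hp + qsec + drat, data = mtcars, nbest = 1)
s <- summary(subs)

which.max(s$adjr2)          # model size with the highest adjusted R^2
which.min(s$bic)            # model size with the lowest BIC
s$which[which.min(s$bic), ] # which predictors that model includes
```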

shapiro.test(rstudent(model_wf_reduce_log)) # A large p-value indicates no evidence of non-normality #

table_wf_resi <- table_wf %>% mutate(student_residual=rstudent(model_wf_reduce_log))
ggpairs(data=table_wf_resi[c(10,3,4,6,7,8,9,11)])

ggpairs(data=table_wf_resi[c(10,3,4,7,11)])


Anova(model_wf_final)
vif(model_wf_final)

confint(model_wf_final, level = 1 - 0.05/1) # Bonferroni joint CIs use level = 1 - alpha/g for g intervals; here g = 1 #

plot(model_wf_final, pch=16, col="blue")
#Create Partial Regression plots #
avPlots(model_wf_final)

confint(model_wf_437, level = 1 - 0.05/1) # Bonferroni joint CIs use level = 1 - alpha/g for g intervals; here g = 1 #

plot(model_wf_437, pch=16, col="blue")
#Create Partial Regression plots #
avPlots(model_wf_437)


deviation <- table_wf$y-mean(table_wf$y)

# Prediction power: Pred_R2 = 1 - PRESS/SST
1 - ((MPV::PRESS(model_wf_final))/(deviation %*% deviation)) # SST computed as a vector cross product #
# prediction power of full
1-((MPV::PRESS(model_wf_reduce_log))/(var(table_wf$y)*(nrow(table_wf)-1)))
# prediction power of 437
1-((MPV::PRESS(model_wf_437))/(var(table_wf$y)*(nrow(table_wf)-1)))
# prediction power of backward
1-((MPV::PRESS(model_wf_final))/(var(table_wf$y)*(nrow(table_wf)-1)))
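MPV::PRESS can also be reproduced from the leave-one-out identity e_(i) = e_i / (1 - h_ii), which avoids the extra dependency; a standalone sketch on built-in data:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# PRESS = sum of squared leave-one-out residuals,
# computed from ordinary residuals and hat values
press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# prediction R^2 = 1 - PRESS/SST
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
1 - press / sst
```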